ABSTRACT

We consider estimation of a sparse parameter vector that determines the covariance matrix of a Gaussian random vector via a sparse expansion into known "basis matrices." Using the theory of reproducing kernel Hilbert spaces, we derive lower bounds on the variance of estimators with a given mean function. This includes unbiased estimation as a special case. We also present a numerical comparison of our lower bounds with the variance of two standard estimators (hard-thresholding estimator and maximum likelihood estimator).

Index Terms — Sparsity, sparse covariance estimation, variance 
bound, reproducing kernel Hilbert space, RKHS. 

1. INTRODUCTION 

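We consider a Gaussian signal vector $\mathbf{s} \in \mathbb{R}^M$, $\mathbf{s} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{C})$, embedded in white Gaussian noise $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})$. The observed vector is

$$\mathbf{y} = \mathbf{s} + \mathbf{n}, \tag{1}$$

where $\mathbf{s}$ and $\mathbf{n}$ are independent and the signal mean $\boldsymbol{\mu}$ and noise variance $\sigma^2$ are known. In what follows, we assume $\boldsymbol{\mu} = \mathbf{0}$, since a nonzero $\boldsymbol{\mu}$ can always be subtracted from $\mathbf{s}$. The signal covariance matrix $\mathbf{C}$ is unknown; we will parameterize it according to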



$$\mathbf{C} = \mathbf{C}(\mathbf{x}) \triangleq \sum_{k=1}^{N} x_k \mathbf{C}_k, \tag{2}$$



with unknown nonrandom coefficients $x_k \ge 0$ and known positive semidefinite "basis matrices" $\mathbf{C}_k$. Thus, estimation of the signal covariance matrix $\mathbf{C}$ reduces to estimation of the coefficient vector $\mathbf{x} = (x_1, \ldots, x_N)^T \in \mathbb{R}_+^N$.
Our central assumption is that $\mathbf{x}$ is $S$-sparse, i.e., at most $S$ coefficients $x_k$ are nonzero. We can formulate this as



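$$\mathbf{x} \in \mathcal{X}_{S,+} \triangleq \big\{ \mathbf{x}' \in \mathbb{R}_+^N \,\big|\, \|\mathbf{x}'\|_0 \le S \big\}. \tag{3}$$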



The sparsity degree $S$ is assumed known; however, the set of positions of the nonzero entries of $\mathbf{x}$ (denoted by $\mathrm{supp}(\mathbf{x})$; note that $|\mathrm{supp}(\mathbf{x})| = \|\mathbf{x}\|_0 \le S$) is unknown. Typically, $S \ll N$. We will refer to (1)-(3) as the sparse covariance model (SCM). The SCM and estimation of $\mathbf{x}$ are relevant, e.g., in time-frequency (TF) analysis [1, 2], where the basis matrices $\mathbf{C}_k$ correspond to disjoint TF regions and $x_k$ represents the mean signal power in the $k$th TF region. An application is cognitive radio scene analysis [3].
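As a concrete illustration of the model (1)-(3), the following minimal Python sketch builds $\mathbf{C}(\mathbf{x})$ from a set of basis matrices and checks membership in $\mathcal{X}_{S,+}$. The function names (`in_sparse_set`, `scm_covariance`) and the toy rank-one basis matrices are assumptions made for this example only.

```python
def in_sparse_set(x, S):
    """Return True if x lies in X_{S,+}: nonnegative entries, at most S nonzeros."""
    return all(v >= 0 for v in x) and sum(1 for v in x if v != 0) <= S


def scm_covariance(x, basis):
    """Compute C(x) = sum_k x_k C_k for square basis matrices (nested lists)."""
    M = len(basis[0])
    C = [[0.0] * M for _ in range(M)]
    for xk, Ck in zip(x, basis):
        for i in range(M):
            for j in range(M):
                C[i][j] += xk * Ck[i][j]
    return C


# Hypothetical toy setup: N = 3 basis matrices e_k e_k^T in R^3.
basis = [[[1.0 if (i == k and j == k) else 0.0 for j in range(3)]
          for i in range(3)] for k in range(3)]
x = [2.0, 0.0, 0.0]  # a 1-sparse coefficient vector
C = scm_covariance(x, basis)
```

With these toy matrices, $\mathbf{C}(\mathbf{x})$ is diagonal and only the first diagonal entry is nonzero, which mirrors the idea that each $x_k$ carries the mean power of one region.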

The problem we will study is estimation of a real vector $\mathbf{z} = \mathbf{g}(\mathbf{x})$ from $\mathbf{y}$, where $\mathbf{g}(\cdot)$ is a known function. This includes estimation of $\mathbf{x}$ and, less trivially, of a linear combination of the $x_k$. In the TF application mentioned above, the latter case corresponds to a linear combination of the mean signal powers in the various TF regions.

In this paper, building on [4, 5], we use the theory of reproducing kernel Hilbert spaces (RKHS) to derive lower bounds on the variance of estimators of $\mathbf{z}$. The estimators are required to have a prescribed differentiable mean function; this includes the case of unbiased estimation. They are allowed to exploit the known sparsity of $\mathbf{x}$. The RKHS framework has been previously proposed for a fundamentally different problem of sparsity-exploiting estimation in [6].

Sparsity-exploiting estimation of $\mathbf{C}$ and of $\mathbf{C}^{-1}$ was considered recently in [7] and in [8], respectively. In both cases, the sparsity assumption was placed on $\mathbf{C}^{-1}$, which corresponds to a sparse graphical model for $\mathbf{s}$. Our SCM approach (1)-(3) is clearly different: while the coefficient vector $\mathbf{x}$ is assumed sparse, the matrices $\mathbf{C}$ or $\mathbf{C}^{-1}$ need not be sparse.

This paper is organized as follows. In Section 2, we review minimum-variance estimation and the RKHS framework. In Section 3, we use RKHS theory to derive lower variance bounds for the SCM. The special case of unbiased estimation is considered in Section 4. Finally, Section 5 presents a numerical comparison of our bounds with the variance of two established estimation schemes.

2. RKHS FORMULATION OF MINIMUM-VARIANCE ESTIMATION

2.1. Minimum-Variance Estimation

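The estimation error incurred by an estimator $\hat{\mathbf{z}}(\mathbf{y})$ of $\mathbf{z} = \mathbf{g}(\mathbf{x})$ can be quantified by the mean squared error (MSE) $\varepsilon(\hat{\mathbf{z}}(\cdot), \mathbf{x}) \triangleq E_{\mathbf{x}}\{\|\hat{\mathbf{z}}(\mathbf{y}) - \mathbf{z}\|_2^2\}$, where the notation $E_{\mathbf{x}}\{\cdot\}$ indicates that the expectation is taken with respect to the pdf $f(\mathbf{y}; \mathbf{x})$ parameterized by $\mathbf{x}$. According to our assumptions in Section 1,

$$f(\mathbf{y}; \mathbf{x}) = \frac{\exp\!\big(-\tfrac{1}{2}\, \mathbf{y}^T \tilde{\mathbf{C}}^{-1}(\mathbf{x})\, \mathbf{y}\big)}{\big[(2\pi)^M \det\{\tilde{\mathbf{C}}(\mathbf{x})\}\big]^{1/2}}, \quad \text{with } \tilde{\mathbf{C}}(\mathbf{x}) \triangleq \mathbf{C}(\mathbf{x}) + \sigma^2 \mathbf{I}. \tag{4}$$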



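Let $z_k$ and $\hat{z}_k(\mathbf{y})$ denote the $k$th entries of $\mathbf{z}$ and $\hat{\mathbf{z}}(\mathbf{y})$, respectively. We have $\varepsilon(\hat{\mathbf{z}}(\cdot), \mathbf{x}) = \sum_k \varepsilon(\hat{z}_k(\cdot), \mathbf{x})$, where the sum runs over all entries of $\mathbf{z}$ and $\varepsilon(\hat{z}_k(\cdot), \mathbf{x}) \triangleq E_{\mathbf{x}}\{[\hat{z}_k(\mathbf{y}) - z_k]^2\}$ denotes the $k$th component MSE. For our purposes, minimization of $\varepsilon(\hat{\mathbf{z}}(\cdot), \mathbf{x})$ with respect to $\hat{\mathbf{z}}(\cdot)$ is equivalent to separate minimization of each component MSE $\varepsilon(\hat{z}_k(\cdot), \mathbf{x})$ with respect to $\hat{z}_k(\cdot)$. We furthermore have

$$\varepsilon(\hat{z}_k(\cdot), \mathbf{x}) = b^2(\hat{z}_k(\cdot), \mathbf{x}) + v(\hat{z}_k(\cdot), \mathbf{x}), \tag{5}$$

with the component bias $b(\hat{z}_k(\cdot), \mathbf{x}) \triangleq E_{\mathbf{x}}\{\hat{z}_k(\mathbf{y})\} - z_k$ and the component variance $v(\hat{z}_k(\cdot), \mathbf{x}) \triangleq E_{\mathbf{x}}\{[\hat{z}_k(\mathbf{y}) - E_{\mathbf{x}}\{\hat{z}_k(\mathbf{y})\}]^2\}$. A common approach to defining a "locally optimal" estimator $\hat{z}_k(\cdot)$ is to require $b(\hat{z}_k(\cdot), \mathbf{x}) = c_k(\mathbf{x})$ for all $\mathbf{x} \in \mathcal{X}_{S,+}$, with a given bias function $c_k(\mathbf{x})$, and to look for estimators that minimize the variance $v(\hat{z}_k(\cdot), \mathbf{x})$ at a given parameter vector $\mathbf{x} = \mathbf{x}_0 \in \mathcal{X}_{S,+}$. It follows from (5) that once the bias is fixed, minimizing $v(\hat{z}_k(\cdot), \mathbf{x}_0)$ is equivalent to minimizing $\varepsilon(\hat{z}_k(\cdot), \mathbf{x}_0)$. Furthermore, fixing the bias is equivalent to fixing the mean, i.e., requiring that $E_{\mathbf{x}}\{\hat{z}_k(\mathbf{y})\} = \gamma_k(\mathbf{x})$ for all $\mathbf{x} \in \mathcal{X}_{S,+}$, where $\gamma_k(\mathbf{x}) \triangleq c_k(\mathbf{x}) + g_k(\mathbf{x})$.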

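In what follows, we consider a fixed component $k$ and drop the subscript $k$ for better readability. Furthermore, we consider a given mean function $\gamma(\mathbf{x})$ (short for $\gamma_k(\mathbf{x})$) and a given nominal parameter vector $\mathbf{x}_0$. We are interested in the minimum variance at $\mathbf{x}_0$ achievable by estimators $\hat{z}(\cdot)$ (short for $\hat{z}_k(\cdot)$) that have mean function $\gamma(\mathbf{x})$ for all $\mathbf{x} \in \mathcal{X}_{S,+}$. In order to derive a lower bound on this achievable variance, let us consider some subset $\mathcal{D} \subseteq \mathcal{X}_{S,+}$. We denote by $\mathcal{B}_{\mathcal{D}}(\mathbf{x}_0)$ the set of all scalar estimators $\hat{z}(\cdot)$ whose mean equals $\gamma(\mathbf{x})$ for all $\mathbf{x} \in \mathcal{D}$ (however, not necessarily for all $\mathbf{x} \in \mathcal{X}_{S,+}$) and whose variance at $\mathbf{x}_0$ is finite, i.e.,

$$\mathcal{B}_{\mathcal{D}}(\mathbf{x}_0) \triangleq \big\{ \hat{z}(\cdot) \,\big|\, E_{\mathbf{x}}\{\hat{z}(\mathbf{y})\} = \gamma(\mathbf{x}) \;\; \forall \mathbf{x} \in \mathcal{D}, \; v(\hat{z}(\cdot), \mathbf{x}_0) < \infty \big\}.$$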

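If $\mathcal{B}_{\mathcal{D}}(\mathbf{x}_0)$ is nonempty, we consider the minimum variance achievable at the given parameter vector $\mathbf{x}_0$ by estimators $\hat{z}(\cdot) \in \mathcal{B}_{\mathcal{D}}(\mathbf{x}_0)$:

$$L_{\mathcal{D}}(\mathbf{x}_0) \triangleq \min_{\hat{z}(\cdot) \in \mathcal{B}_{\mathcal{D}}(\mathbf{x}_0)} v(\hat{z}(\cdot), \mathbf{x}_0). \tag{6}$$

The use of min (rather than inf) in (6) is justified by the fact that the existence of a finite minimum can always be guaranteed by a proper choice of $\mathcal{D}$; a sufficient condition will be provided in Section 2.2.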

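Because $\mathcal{D} \subseteq \mathcal{X}_{S,+}$, $L_{\mathcal{D}}(\mathbf{x}_0)$ is a lower bound on the variance at $\mathbf{x}_0$ of any estimator $\hat{z}(\cdot)$ whose mean is $\gamma(\mathbf{x})$ for all $\mathbf{x} \in \mathcal{X}_{S,+}$ (and not just for all $\mathbf{x} \in \mathcal{D}$), i.e.,

$$L_{\mathcal{D}}(\mathbf{x}_0) \le v(\hat{z}(\cdot), \mathbf{x}_0), \quad \text{for any } \hat{z}(\cdot) \text{ such that } E_{\mathbf{x}}\{\hat{z}(\mathbf{y})\} = \gamma(\mathbf{x}) \;\; \forall \mathbf{x} \in \mathcal{X}_{S,+}. \tag{7}$$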

2.2. RKHS Formulation 

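An inner product of two real random variables $a = a(\mathbf{y})$, $b = b(\mathbf{y})$ can be defined as $\langle a, b \rangle_{\mathrm{RV}} \triangleq E_{\mathbf{x}_0}\{a(\mathbf{y})\, b(\mathbf{y})\}$, with induced norm $\|a\|_{\mathrm{RV}} = \sqrt{\langle a, a \rangle_{\mathrm{RV}}} = \sqrt{E_{\mathbf{x}_0}\{a^2(\mathbf{y})\}}$. Note the dependence on $\mathbf{x}_0$. One can show that (6) can be rewritten formally as the following constrained norm-minimization problem:

$$L_{\mathcal{D}}(\mathbf{x}_0) = \min_{\hat{z}(\cdot)} \|\hat{z}\|_{\mathrm{RV}}^2 - \gamma^2(\mathbf{x}_0) \quad \text{subject to } \langle \hat{z}, \rho_{\mathbf{x}} \rangle_{\mathrm{RV}} = \gamma(\mathbf{x}) \;\; \forall \mathbf{x} \in \mathcal{D}. \tag{8}$$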

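Furthermore, if $\mathcal{B}_{\mathcal{D}}(\mathbf{x}_0)$ is nonempty, the existence of a finite minimum in (8) can be guaranteed by choosing $\mathcal{D}$ such that [5]

$$\|\rho_{\mathbf{x}}\|_{\mathrm{RV}}^2 = E_{\mathbf{x}_0}\{\rho_{\mathbf{x}}^2(\mathbf{y})\} < \infty \quad \forall \mathbf{x} \in \mathcal{D}, \tag{9}$$

where

$$\rho_{\mathbf{x}}(\mathbf{y}) \triangleq \frac{f(\mathbf{y}; \mathbf{x})}{f(\mathbf{y}; \mathbf{x}_0)}. \tag{10}$$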

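According to [4], the solutions of (8) can be described using an RKHS $\mathcal{H}(R)$ with kernel $R(\mathbf{x}_1, \mathbf{x}_2) : \mathcal{D} \times \mathcal{D} \to \mathbb{R}$ given by

$$R(\mathbf{x}_1, \mathbf{x}_2) = \langle \rho_{\mathbf{x}_1}, \rho_{\mathbf{x}_2} \rangle_{\mathrm{RV}} = E_{\mathbf{x}_0}\{\rho_{\mathbf{x}_1}(\mathbf{y})\, \rho_{\mathbf{x}_2}(\mathbf{y})\}. \tag{11}$$

Note that $R(\mathbf{x}_1, \mathbf{x}_2)$ and $\mathcal{H}(R)$ depend on $\mathbf{x}_0$. Inserting (10) and (4) into (11) yields the expression

$$R(\mathbf{x}_1, \mathbf{x}_2) = \big[\det\{\tilde{\mathbf{C}}(\mathbf{x}_0)\}\big]^{1/2} \Big[\det\big\{\tilde{\mathbf{C}}(\mathbf{x}_1)\, \tilde{\mathbf{C}}(\mathbf{x}_2)\, \big(\tilde{\mathbf{C}}^{-1}(\mathbf{x}_1) + \tilde{\mathbf{C}}^{-1}(\mathbf{x}_2) - \tilde{\mathbf{C}}^{-1}(\mathbf{x}_0)\big)\big\}\Big]^{-1/2}, \tag{12}$$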



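where as before $\tilde{\mathbf{C}}(\mathbf{x}) = \mathbf{C}(\mathbf{x}) + \sigma^2 \mathbf{I}$. The RKHS $\mathcal{H}(R)$ is a Hilbert space of functions $f : \mathcal{D} \to \mathbb{R}$ that is defined as the closure of the linear span of the set of functions $\{f_{\mathbf{x}'}(\mathbf{x}) = R(\mathbf{x}, \mathbf{x}')\}_{\mathbf{x}' \in \mathcal{D}}$. This closure is taken with respect to the topology that is given by the inner product $\langle \cdot, \cdot \rangle_{\mathcal{H}(R)}$ defined via the reproducing property [9]

$$\langle f(\cdot), R(\cdot, \mathbf{x}') \rangle_{\mathcal{H}(R)} = f(\mathbf{x}'), \quad f \in \mathcal{H}(R), \; \mathbf{x}' \in \mathcal{D}.$$

The induced norm is $\|f\|_{\mathcal{H}(R)} = \sqrt{\langle f, f \rangle_{\mathcal{H}(R)}}$.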

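It can be shown [4, 9] that if $\mathcal{D}$ satisfies (9), then $\gamma \in \mathcal{H}(R)$ is necessary and sufficient for $\mathcal{B}_{\mathcal{D}}(\mathbf{x}_0)$ to be nonempty and the minimum value $L_{\mathcal{D}}(\mathbf{x}_0)$ in (8) to exist and be given by

$$L_{\mathcal{D}}(\mathbf{x}_0) = \|\gamma\|_{\mathcal{H}(R)}^2 - \gamma^2(\mathbf{x}_0). \tag{13}$$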

3. LOWER BOUNDS ON THE ESTIMATOR VARIANCE 

According to (13), any lower bound on $\|\gamma\|_{\mathcal{H}(R)}^2$ entails a lower bound on $L_{\mathcal{D}}(\mathbf{x}_0)$. For mathematical tractability, we hereafter assume that the basis matrices $\mathbf{C}_k$ in (2) are projection matrices on orthogonal subspaces of $\mathbb{R}^M$. Thus, they can be written as

$$\mathbf{C}_k = \sum_{i=1}^{r_k} \mathbf{u}_{m_{k,i}} \mathbf{u}_{m_{k,i}}^T, \quad k = 1, \ldots, N, \tag{14}$$

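where $\{\mathbf{u}_m\}_{m=1}^{M}$ is an orthonormal basis for $\mathbb{R}^M$ and the sets $\mathcal{U}_k \triangleq \{\mathbf{u}_{m_{k,i}}\}_{i=1}^{r_k}$ are disjoint, so that they span orthogonal subspaces of $\mathbb{R}^M$ (and $r_k$ is the rank of $\mathbf{C}_k$). We note that (2) and (14) correspond to a latent variable model $\mathbf{s} = \sum_{k=1}^{N} \mathbf{s}_k$ with $\mathbf{s}_k = \sum_{i=1}^{r_k} \xi_{m_{k,i}} \mathbf{u}_{m_{k,i}}$, where the $\xi_{m_{k,i}}$ are independent zero-mean Gaussian with variance $x_k$ for all $i$, i.e., $\xi_{m_{k,i}} \sim \mathcal{N}(0, x_k)$. This is similar to the latent variable model used in probabilistic principal component analysis [10] except that our "factors" $\mathbf{u}_m$ are fixed. With (14), the kernel expression in (12) simplifies to

$$R(\mathbf{x}_1, \mathbf{x}_2) = \prod_{k=1}^{N} \frac{(x_{0,k} + \sigma^2)^{r_k}}{\big[(x_{0,k} + \sigma^2)^2 - (x_{1,k} - x_{0,k})(x_{2,k} - x_{0,k})\big]^{r_k/2}},$$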

where, e.g., $x_{0,k}$ denotes the $k$th entry of $\mathbf{x}_0$. We will refer to the SCM with basis matrices $\mathbf{C}_k$ of the form (14) as the sparse diagonalizable covariance model (SDCM).$^1$ It can be shown that, within the SDCM, a sufficient condition for (9), and thus for the existence of a minimum in (8), is $x_k < 2 x_{0,k} + \sigma^2$ for all $k \in \{1, \ldots, N\}$. Therefore, we choose our domain as

$$\mathcal{D} = \big\{ \mathbf{x} \in \mathcal{X}_{S,+} \,\big|\, x_k < 2 x_{0,k} + \sigma^2 \;\; \forall k \in \{1, \ldots, N\} \big\}.$$

Note that $\mathcal{D}$ depends on $\mathbf{x}_0$.
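The simplification of the kernel can be checked numerically. The sketch below, a minimal illustration rather than the authors' code, evaluates the product-form kernel of the SDCM and the determinant expression (12) for the special case of diagonal basis matrices (an assumption of this example, corresponding to $\mathbf{u}_m$ being the standard basis vectors), and also tests membership in the domain $\mathcal{D}$. All function names and the numerical values are hypothetical.

```python
import math


def kernel_sdcm(x1, x2, x0, r, sigma2):
    """Product-form kernel of the SDCM; r[k] is the rank of C_k."""
    value = 1.0
    for x1k, x2k, x0k, rk in zip(x1, x2, x0, r):
        a0 = x0k + sigma2
        value *= (a0 * a0 / (a0 * a0 - (x1k - x0k) * (x2k - x0k))) ** (rk / 2.0)
    return value


def kernel_general_diag(x1, x2, x0, r, sigma2):
    """Determinant expression (12), evaluated for diagonal basis matrices.

    Each C_k contributes r[k] diagonal entries equal to x_k + sigma2, so every
    determinant in (12) reduces to a product over those entries.
    """
    log_value = 0.0
    for x1k, x2k, x0k, rk in zip(x1, x2, x0, r):
        c0, c1, c2 = x0k + sigma2, x1k + sigma2, x2k + sigma2
        inner = c1 * c2 * (1.0 / c1 + 1.0 / c2 - 1.0 / c0)
        log_value += rk * 0.5 * (math.log(c0) - math.log(inner))
    return math.exp(log_value)


def in_domain(x, x0, sigma2, S):
    """Check x in D: x in X_{S,+} and x_k < 2 x_{0,k} + sigma2 for all k."""
    sparse = all(v >= 0 for v in x) and sum(1 for v in x if v != 0) <= S
    return sparse and all(xk < 2 * x0k + sigma2 for xk, x0k in zip(x, x0))


# Hypothetical example values.
x0 = [2.0, 0.0, 1.0]
x1 = [1.0, 0.5, 0.0]
x2 = [3.0, 0.0, 0.2]
r = [1, 2, 3]
sigma2 = 1.0
k_prod = kernel_sdcm(x1, x2, x0, r, sigma2)
k_det = kernel_general_diag(x1, x2, x0, r, sigma2)
```

Both evaluations agree for arguments inside $\mathcal{D}$; outside $\mathcal{D}$ the bracketed term of the product form can become nonpositive, which is why the domain restriction is needed.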

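We will now derive a lower bound on $\|\gamma\|_{\mathcal{H}(R)}^2$ for the SDCM. Let us assume for the moment that $\gamma \in \mathcal{H}(R)$. Consider $L$ functions $v_l(\mathbf{x})$, $l = 1, \ldots, L$, with $v_l : \mathcal{D} \to \mathbb{R}$ and $v_l \in \mathcal{H}(R)$, which are orthogonal, i.e., $\langle v_l, v_{l'} \rangle_{\mathcal{H}(R)} = 0$ if $l \ne l'$. Let $\mathcal{V}$ denote the subspace of $\mathcal{H}(R)$ spanned by the $v_l$, and $\mathbf{P}_{\mathcal{V}}$ the orthogonal projection operator on $\mathcal{V}$. Clearly, a lower bound on $\|\gamma\|_{\mathcal{H}(R)}^2$ is given by

$$\|\mathbf{P}_{\mathcal{V}} \gamma\|_{\mathcal{H}(R)}^2 \le \|\gamma\|_{\mathcal{H}(R)}^2. \tag{15}$$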



$^1$Indeed, for the SDCM, the covariance matrix $\mathbf{C}(\mathbf{x})$ can be diagonalized by a signal transformation $\mathbf{s}' = \mathbf{U}\mathbf{s}$, with a unitary matrix $\mathbf{U}$ that does not depend on the true parameter vector $\mathbf{x}$.



This lower bound can be expressed as

$$\|\mathbf{P}_{\mathcal{V}} \gamma\|_{\mathcal{H}(R)}^2 = \sum_{l=1}^{L} \frac{|\langle \gamma, v_l \rangle_{\mathcal{H}(R)}|^2}{\|v_l\|_{\mathcal{H}(R)}^2}. \tag{16}$$



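A convenient construction of functions $v_l(\mathbf{x})$ is via partial derivatives of $R(\mathbf{x}_1, \mathbf{x}_2)$ with respect to $\mathbf{x}_2$ [4]. Consider an index set $\mathcal{K}$ containing exactly $S$ indices from $\{1, \ldots, N\}$, i.e., $\mathcal{K} \subseteq \{1, \ldots, N\}$ and $|\mathcal{K}| = S$. Furthermore, let $\mathbf{p}_l = (p_{l,1}, \ldots, p_{l,N}) \in \mathbb{N}_0^N$, $l = 1, \ldots, L$, be $L$ different multi-indices satisfying $\mathrm{supp}(\mathbf{p}_l) \subseteq \mathcal{K}$. We then define

$$v_l(\mathbf{x}) \triangleq \frac{\partial^{\mathbf{p}_l} R(\mathbf{x}, \mathbf{x}_2)}{\partial \mathbf{x}_2^{\mathbf{p}_l}} \bigg|_{\mathbf{x}_2 = \mathbf{x}_0^{\mathcal{K}}}, \quad l = 1, \ldots, L, \tag{17}$$

where $\frac{\partial^{\mathbf{p}_l} f(\mathbf{x})}{\partial \mathbf{x}^{\mathbf{p}_l}} \triangleq \Big( \prod_{k=1}^{N} \frac{\partial^{p_{l,k}}}{\partial x_k^{p_{l,k}}} \Big) f(\mathbf{x})$ and $\mathbf{x}_0^{\mathcal{K}}$ is obtained from $\mathbf{x}_0$ by zeroing all entries except those whose indices are in $\mathcal{K}$. It can be verified that the functions $v_l$ are orthogonal, i.e.,

$$\langle v_l, v_{l'} \rangle_{\mathcal{H}(R)} = q_l(\mathbf{x}_0)\, \delta_{l,l'}, \tag{18}$$

where $q_l(\mathbf{x}_0) \triangleq \frac{\partial^{\mathbf{p}_l} \partial^{\mathbf{p}_l} R(\mathbf{x}_1, \mathbf{x}_2)}{\partial \mathbf{x}_1^{\mathbf{p}_l}\, \partial \mathbf{x}_2^{\mathbf{p}_l}} \Big|_{\mathbf{x}_1 = \mathbf{x}_2 = \mathbf{x}_0^{\mathcal{K}}}$. Furthermore [4],

$$\langle f, v_l \rangle_{\mathcal{H}(R)} = \frac{\partial^{\mathbf{p}_l} f(\mathbf{x})}{\partial \mathbf{x}^{\mathbf{p}_l}} \bigg|_{\mathbf{x} = \mathbf{x}_0^{\mathcal{K}}} \quad \text{for any } f \in \mathcal{H}(R). \tag{19}$$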



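Using (18) and (19) in (16), we obtain

$$\|\mathbf{P}_{\mathcal{V}} \gamma\|_{\mathcal{H}(R)}^2 = \sum_{l=1}^{L} \frac{1}{q_l(\mathbf{x}_0)} \bigg| \frac{\partial^{\mathbf{p}_l} \gamma(\mathbf{x})}{\partial \mathbf{x}^{\mathbf{p}_l}} \bigg|_{\mathbf{x} = \mathbf{x}_0^{\mathcal{K}}} \bigg|^2. \tag{20}$$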



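Finally, combining (7), (13), (15), and (20), we arrive at the following bound. (Hereafter, we again explicitly indicate the index $k$.)

Theorem 3.1. For the SDCM, let $\hat{z}_k(\cdot)$ be any estimator of $z_k = g_k(\mathbf{x})$ whose mean equals $\gamma_k(\mathbf{x})$ for all $\mathbf{x} \in \mathcal{X}_{S,+}$ and whose variance at a fixed $\mathbf{x}_0 \in \mathcal{X}_{S,+}$ is finite. Then, this variance satisfies

$$v(\hat{z}_k(\cdot), \mathbf{x}_0) \ge \sum_{l=1}^{L} \frac{1}{q_l(\mathbf{x}_0)} \bigg| \frac{\partial^{\mathbf{p}_l} \gamma_k(\mathbf{x})}{\partial \mathbf{x}^{\mathbf{p}_l}} \bigg|_{\mathbf{x} = \mathbf{x}_0^{\mathcal{K}}} \bigg|^2 - \gamma_k^2(\mathbf{x}_0), \tag{21}$$

for any choice of $L$ different $\mathbf{p}_l \in \mathbb{N}_0^N$ such that $\mathrm{supp}(\mathbf{p}_l) \subseteq \mathcal{K}$, where $\mathcal{K} \subseteq \{1, \ldots, N\}$ is an arbitrary set of $S$ different indices. The lower bound (21) is achieved by an estimator $\hat{z}_k(\cdot)$ if and only if there are nonrandom coefficients $a_l \in \mathbb{R}$ such that

$$\hat{z}_k(\mathbf{y}) = \sum_{l=1}^{L} a_l\, \frac{\partial^{\mathbf{p}_l} \rho_{\mathbf{x}}(\mathbf{y})}{\partial \mathbf{x}^{\mathbf{p}_l}} \bigg|_{\mathbf{x} = \mathbf{x}_0^{\mathcal{K}}},$$

with the random variables $\rho_{\mathbf{x}}(\mathbf{y})$ being defined in (10).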

Note that the bound in (21) depends on $\gamma_k(\mathbf{x})$ only via a finite number of partial derivatives of $\gamma_k(\mathbf{x})$ at $\mathbf{x} = \mathbf{x}_0^{\mathcal{K}}$. Thus, it only depends on the local behavior of the prescribed mean or bias. We furthermore note that Theorem 3.1 does not mention the condition $\gamma_k \in \mathcal{H}(R)$ we used in its derivation. This is no problem because it can be shown [4] that if $\gamma_k \notin \mathcal{H}(R)$, there exists no estimator that has mean $\gamma_k(\mathbf{x})$ for all $\mathbf{x} \in \mathcal{X}_{S,+}$ and finite variance at $\mathbf{x}_0$.
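To make the structure of the bound (21) concrete, the following Python sketch evaluates it for $L = 2$ with $\mathbf{p}_1 = \mathbf{0}$ and $\mathbf{p}_2 = \mathbf{e}_k$ (the $k$th unit vector), using the product-form kernel of the SDCM. The derivatives are approximated by central finite differences, which is a choice made for this illustration only; the function names, the step size `h`, and the numerical values are assumptions and do not come from the original text.

```python
import math


def kernel_sdcm(x1, x2, x0, r, sigma2):
    """Product-form kernel of the SDCM; r[k] is the rank of C_k."""
    value = 1.0
    for x1k, x2k, x0k, rk in zip(x1, x2, x0, r):
        a0 = x0k + sigma2
        value *= (a0 * a0 / (a0 * a0 - (x1k - x0k) * (x2k - x0k))) ** (rk / 2.0)
    return value


def first_order_bound(gamma, x0, r, sigma2, K, k, h=1e-3):
    """Evaluate (21) with L = 2, p_1 = 0, p_2 = e_k; requires k in K."""
    # x0 restricted to the index set K (all other entries set to zero)
    x0K = [v if j in K else 0.0 for j, v in enumerate(x0)]

    def shifted(d):
        y = list(x0K)
        y[k] += d
        return y

    # q_1: kernel value at x1 = x2 = x0K (no derivatives)
    q1 = kernel_sdcm(x0K, x0K, x0, r, sigma2)
    # q_2: mixed derivative d^2 R / (dx_{1,k} dx_{2,k}) by central differences
    q2 = (kernel_sdcm(shifted(h), shifted(h), x0, r, sigma2)
          - kernel_sdcm(shifted(h), shifted(-h), x0, r, sigma2)
          - kernel_sdcm(shifted(-h), shifted(h), x0, r, sigma2)
          + kernel_sdcm(shifted(-h), shifted(-h), x0, r, sigma2)) / (4 * h * h)
    # first derivative of the prescribed mean function at x0K
    dgamma = (gamma(shifted(h)) - gamma(shifted(-h))) / (2 * h)
    return gamma(x0K) ** 2 / q1 + dgamma ** 2 / q2 - gamma(x0) ** 2


# Hypothetical example: unbiased estimation of x_k, S = 1, rank-one C_k.
x0 = [2.0, 0.0, 0.0]
r = [1, 1, 1]
sigma2 = 1.0
b_on_support = first_order_bound(lambda x: x[0], x0, r, sigma2, K={0}, k=0)
b_off_support = first_order_bound(lambda x: x[1], x0, r, sigma2, K={1}, k=1)
```

In this toy setting the bound for the nonzero entry is much larger than for a zero entry, because $q_1(\mathbf{x}_0)$ exceeds one whenever $\mathcal{K}$ leaves out a nonzero entry of $\mathbf{x}_0$.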



4. SPECIAL CASE: UNBIASED ESTIMATION

In this section, we evaluate the bound (21) for the important special case of unbiased estimation of $\mathbf{x}$, i.e., for $z_k = g_k(\mathbf{x}) = x_k$ and $c_k(\mathbf{x}) \equiv 0$ or equivalently $\gamma_k(\mathbf{x}) = x_k$. To obtain a simple expression, we use $L = 2$ and particular choices of $\mathcal{K}$ and $\mathbf{p}_l$ ($l = 1, 2$). Specifically, using $\mathcal{K} = \{k\} \cup \mathcal{L}$, where $\mathcal{L}$ consists of the indices of the $S - 1$ largest entries of the vector that is obtained from $\mathbf{x}_0$ by zeroing the $k$th entry, and $\mathbf{p}_1 = \mathbf{0}$ and $\mathbf{p}_2 = \mathbf{e}_k$, where $\mathbf{e}_k$ denotes the $k$th column of the identity matrix, the following variance bound is obtained from Theorem 3.1.



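Corollary 4.1. For the SDCM, let $\hat{x}_k(\cdot)$ be any estimator of $x_k$ that is unbiased (i.e., $\gamma_k(\mathbf{x}) = x_k$) for all $\mathbf{x} \in \mathcal{X}_{S,+}$ and whose variance at a fixed $\mathbf{x}_0 \in \mathcal{X}_{S,+}$ is finite. Then, this variance satisfies

$$v(\hat{x}_k(\cdot), \mathbf{x}_0) \ge \begin{cases} \dfrac{2}{r_k} (x_{0,k} + \sigma^2)^2, & k \in \mathrm{supp}(\mathbf{x}_0) \\[2mm] \dfrac{2}{r_k}\, \sigma^4\, \dfrac{\big[(\xi(\mathbf{x}_0) + \sigma^2)^2 - \xi^2(\mathbf{x}_0)\big]^{r_{j_0}/2}}{(\xi(\mathbf{x}_0) + \sigma^2)^{r_{j_0}}}, & k \notin \mathrm{supp}(\mathbf{x}_0), \end{cases} \tag{22}$$

where $\xi(\mathbf{x}_0)$ and $j_0$ denote the value and index, respectively, of the $S$th largest entry of $\mathbf{x}_0$.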

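The lower bound (22) can be achieved at least in the following two cases: (i) if $k \in \mathrm{supp}(\mathbf{x}_0)$, and (ii) for any $k \in \{1, \ldots, N\}$ if $\|\mathbf{x}_0\|_0 < S$ (note that this condition implies $\xi(\mathbf{x}_0) = 0$). In both cases, the estimator given by

$$\hat{x}_k(\mathbf{y}) = \beta_k(\mathbf{y}) - \sigma^2, \quad \text{with } \beta_k(\mathbf{y}) \triangleq \frac{1}{r_k} \sum_{i=1}^{r_k} \big(\mathbf{u}_{m_{k,i}}^T \mathbf{y}\big)^2, \tag{23}$$

is unbiased and its variance achieves the bound (22). This estimator does not use the sparsity information and does not depend on $\mathbf{x}_0$.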
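The unbiasedness of this estimator and its variance $\frac{2}{r_k}(x_k + \sigma^2)^2$ can be checked by simulation. The following Monte Carlo sketch assumes, for simplicity, that the $\mathbf{u}_m$ are the standard basis vectors, so that the projections $\mathbf{u}_{m_{k,i}}^T \mathbf{y}$ are independent zero-mean Gaussians with variance $x_k + \sigma^2$. The function name, seed, and parameter values are hypothetical.

```python
import math
import random


def simulate_estimator(x, r, sigma2, n_trials, seed=0):
    """Empirical mean and variance of beta_k(y) - sigma2 for each k.

    Assumes standard-basis u_m, so the r[k] projections belonging to C_k
    are i.i.d. N(0, x[k] + sigma2).
    """
    rng = random.Random(seed)
    N = len(x)
    total = [0.0] * N
    total_sq = [0.0] * N
    for _ in range(n_trials):
        for k in range(N):
            std = math.sqrt(x[k] + sigma2)
            # beta_k(y): average energy of y in the k-th subspace
            beta_k = sum(rng.gauss(0.0, std) ** 2 for _ in range(r[k])) / r[k]
            est = beta_k - sigma2
            total[k] += est
            total_sq[k] += est * est
    means = [t / n_trials for t in total]
    variances = [s / n_trials - m * m for s, m in zip(total_sq, means)]
    return means, variances


# Hypothetical example: N = 2 subspaces of rank 2 each, 1-sparse x.
x = [1.5, 0.0]
r = [2, 2]
sigma2 = 1.0
means, variances = simulate_estimator(x, r, sigma2, n_trials=200000)
predicted = [2.0 / rk * (xk + sigma2) ** 2 for xk, rk in zip(x, r)]
```

The empirical variances should be close to `predicted` for every $k$, whether or not $k$ is in the support, which reflects that the estimator ignores sparsity.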

Let us define a "signal-to-noise ratio" (SNR) quantity as $\mathrm{SNR}(\mathbf{x}_0) \triangleq \xi(\mathbf{x}_0)/\sigma^2$. For $\mathrm{SNR}(\mathbf{x}_0) \ll 1$, the lower bound (22) is approximately $\frac{2}{r_k}(x_{0,k} + \sigma^2)^2$ for any $k$, which does not depend on $S$ and moreover equals the variance of the unbiased estimator (23). Since that estimator does not exploit any sparsity information, Corollary 4.1 suggests that, in the low-SNR regime, unbiased estimators cannot exploit the prior information that $\mathbf{x}$ is $S$-sparse. However, in the high-SNR regime ($\mathrm{SNR}(\mathbf{x}_0) \to \infty$), (22) becomes $\frac{2}{r_k}(x_{0,k} + \sigma^2)^2$ for $k \in \mathrm{supp}(\mathbf{x}_0)$ and $0$ for $k \notin \mathrm{supp}(\mathbf{x}_0)$, which can be shown to equal the variance of the oracle estimator that knows $\mathrm{supp}(\mathbf{x}_0)$ (this oracle estimator yields $\hat{x}_k = x_{0,k} = 0$ for all $k \notin \mathrm{supp}(\mathbf{x}_0)$). The transition of the lower bound (22) from the low-SNR regime to the high-SNR regime has a polynomial characteristic; it is thus much slower than the exponential transition of an analogous lower bound recently derived in [6] for the sparse linear model. This slow transition suggests that the optimal estimator for low SNR, which ignores the sparsity information, will also be nearly optimal over a relatively wide SNR range. This further suggests that, for covariance estimation based on the SDCM, prior information of sparsity is not as helpful as for estimating the mean of a Gaussian random vector based on the sparse linear model [6].
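The two SNR regimes can be reproduced with a few lines of code. The sketch below evaluates the closed-form bound of Corollary 4.1, written here as $\frac{2}{r_k}(x_{0,k}+\sigma^2)^2$ on the support and $\frac{2}{r_k}\sigma^4 \big[(\xi+\sigma^2)^2 - \xi^2\big]^{r_{j_0}/2} / (\xi+\sigma^2)^{r_{j_0}}$ off the support; this reading of (22), the function name, and the example values are assumptions of the illustration.

```python
import math


def corollary_bound(k, x0, r, sigma2, S):
    """Closed-form lower bound on the variance of unbiased estimators of x_k."""
    if x0[k] != 0:
        return 2.0 / r[k] * (x0[k] + sigma2) ** 2
    # value xi and index j0 of the S-th largest entry of x0
    order = sorted(range(len(x0)), key=lambda j: x0[j], reverse=True)
    j0 = order[S - 1]
    xi = x0[j0]
    ratio = ((xi + sigma2) ** 2 - xi ** 2) / (xi + sigma2) ** 2
    return 2.0 / r[k] * sigma2 ** 2 * ratio ** (r[j0] / 2.0)


# Hypothetical sweep: S = 1, rank-one C_k, sigma2 = 1, off-support index k = 1.
r = [1, 1, 1]
sigma2 = 1.0
snr_db = [-20, 0, 20, 40]
off_support = [corollary_bound(1, [10 ** (s / 10.0) * sigma2, 0.0, 0.0],
                               r, sigma2, S=1) for s in snr_db]
low_snr = corollary_bound(1, [1e-6, 0.0, 0.0], r, sigma2, S=1)
high_snr = corollary_bound(1, [1e6, 0.0, 0.0], r, sigma2, S=1)
```

For rank-one basis matrices the off-support bound decays roughly like $\mathrm{SNR}^{-1/2}$, so a 20 dB increase in SNR lowers it by only about a factor of ten, which illustrates the slow polynomial transition.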

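In the special case where $S = 1$ and $\mathbf{x}_0 \ne \mathbf{0}$, let $\xi_0$ and $j_0$ denote, respectively, the value and index of the single nonzero entry of $\mathbf{x}_0 \in \mathcal{X}_{1,+}$. Consider the estimator $\hat{\mathbf{x}}^{(\mathbf{x}_0)}(\cdot)$ given componentwise by

$$\hat{x}_k^{(\mathbf{x}_0)}(\mathbf{y}) \triangleq \begin{cases} \beta_k(\mathbf{y}) - \sigma^2, & k = j_0 \\ \alpha(\mathbf{y}; \mathbf{x}_0)\, \big(\beta_k(\mathbf{y}) - \sigma^2\big), & k \ne j_0, \end{cases} \tag{24}$$

where $\alpha(\mathbf{y}; \mathbf{x}_0) \triangleq a(\mathbf{x}_0) \exp\!\big(-r_{j_0}\, b(\mathbf{x}_0)\, \beta_{j_0}(\mathbf{y})\big)$ with $a(\mathbf{x}_0) \triangleq \Big(\frac{2\xi_0 + \sigma^2}{\xi_0 + \sigma^2}\Big)^{r_{j_0}/2}$ and $b(\mathbf{x}_0) \triangleq \frac{\xi_0}{2\sigma^2(\xi_0 + \sigma^2)}$. One can show using RKHS theory that $\hat{\mathbf{x}}^{(\mathbf{x}_0)}(\cdot)$ is unbiased and has the minimum variance achievable by unbiased estimators at any $\mathbf{x}_0 \in \mathcal{X}_{1,+}$ with $\mathbf{x}_0 \ne \mathbf{0}$. Note that this estimator depends explicitly on the assumed $\mathbf{x}_0$, at which it achieves minimum variance; its performance may be poor when the true parameter vector $\mathbf{x}$ is different from $\mathbf{x}_0$.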
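A small simulation can illustrate both properties claimed for this estimator. The sketch below treats the case $S = 1$ with rank-one basis matrices and standard-basis $\mathbf{u}_m$, so that $\beta_k(\mathbf{y}) = y_k^2$. The constants are taken as $a(\mathbf{x}_0) = \big((2\xi_0+\sigma^2)/(\xi_0+\sigma^2)\big)^{1/2}$ and $b(\mathbf{x}_0) = \xi_0 / (2\sigma^2(\xi_0+\sigma^2))$; these values are an assumption of this sketch (they follow from requiring unbiasedness and matching the off-support bound of Corollary 4.1), as are the function names and example values.

```python
import math
import random


def locally_optimal_estimate(y, x0, sigma2):
    """Estimator of the form (24) for S = 1, r_k = 1, beta_k(y) = y_k^2."""
    j0 = max(range(len(x0)), key=lambda j: x0[j])  # index of the nonzero entry
    xi0 = x0[j0]
    # assumed constants a(x0), b(x0) for the rank-one case
    a = math.sqrt((2.0 * xi0 + sigma2) / (xi0 + sigma2))
    b = xi0 / (2.0 * sigma2 * (xi0 + sigma2))
    alpha = a * math.exp(-b * y[j0] ** 2)
    return [(y[k] ** 2 - sigma2) if k == j0 else alpha * (y[k] ** 2 - sigma2)
            for k in range(len(y))]


def monte_carlo_moments(x_true, x0, sigma2, n_trials, seed=0):
    """Empirical mean and variance of the estimator when y ~ f(y; x_true)."""
    rng = random.Random(seed)
    N = len(x0)
    total = [0.0] * N
    total_sq = [0.0] * N
    for _ in range(n_trials):
        y = [rng.gauss(0.0, math.sqrt(xk + sigma2)) for xk in x_true]
        est = locally_optimal_estimate(y, x0, sigma2)
        for k in range(N):
            total[k] += est[k]
            total_sq[k] += est[k] * est[k]
    means = [t / n_trials for t in total]
    variances = [s / n_trials - m * m for s, m in zip(total_sq, means)]
    return means, variances


# Hypothetical example: nominal x0 has its nonzero entry at index 0.
x0 = [2.0, 0.0]
sigma2 = 1.0
# Unbiasedness at a different 1-sparse parameter vector:
means_other, _ = monte_carlo_moments([0.0, 1.0], x0, sigma2, 200000, seed=1)
# Variance at the nominal x0, compared with the off-support bound:
_, variances_nominal = monte_carlo_moments(x0, x0, sigma2, 200000, seed=2)
off_support_bound = 2.0 * sigma2 ** 2 * math.sqrt(
    (x0[0] + sigma2) ** 2 - x0[0] ** 2) / (x0[0] + sigma2)
```

Under these assumptions, the empirical mean stays close to the true parameter even away from $\mathbf{x}_0$, while the empirical variance of the off-support component at $\mathbf{x}_0$ matches the bound.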

5. NUMERICAL RESULTS 

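We compare the lower bound (21) for $\mathbf{g}(\mathbf{x}) = \mathbf{x}$ with the variance of two standard estimators. The first is an ad-hoc adaptation of the hard-thresholding (HT) estimator [11] to SDCM-based covariance estimation. It is defined componentwise as (cf. (23))

$$\hat{x}_{k,\mathrm{HT}}(\mathbf{y}) \triangleq \varphi_T\big(\beta_k(\mathbf{y}) - \sigma^2\big),$$

where $\varphi_T : \mathbb{R} \to \mathbb{R}$ denotes the hard-thresholding function with threshold $T \ge 0$, i.e., $\varphi_T(y)$ is $y$ for $|y| \ge T$ and $0$ else. The second standard method is the maximum likelihood (ML) estimator

$$\hat{\mathbf{x}}_{\mathrm{ML}}(\mathbf{y}) \triangleq \arg\max_{\mathbf{x}' \in \mathcal{X}_{S,+}} f(\mathbf{y}; \mathbf{x}').$$

For the SDCM, one can show that

$$\hat{x}_{k,\mathrm{ML}}(\mathbf{y}) = \begin{cases} \beta_k(\mathbf{y}) - \sigma^2, & k \in \mathcal{L}_1 \cap \mathcal{L}_2 \\ 0, & \text{else}, \end{cases}$$

where $\mathcal{L}_1$ consists of the $S$ indices $k$ for which $\big[\beta_k(\mathbf{y})/\sigma^2 - \ln(\beta_k(\mathbf{y})/\sigma^2) - 1\big]$ (with $\ln = \log_e$) is largest, and $\mathcal{L}_2$ consists of all indices $k$ for which $\beta_k(\mathbf{y}) \ge \sigma^2$.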
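Both estimators operate on the subspace energies $\beta_k(\mathbf{y})$ only, which the following Python sketch makes explicit. The HT rule is implemented as a threshold applied to $\beta_k(\mathbf{y}) - \sigma^2$, which is an assumed reading of the definition; the ML rule follows the description in terms of $\mathcal{L}_1$ and $\mathcal{L}_2$. Function names and input values are hypothetical.

```python
import math


def hard_threshold(v, T):
    """Hard-thresholding function: keep v if |v| >= T, else return 0."""
    return v if abs(v) >= T else 0.0


def ht_estimate(beta, sigma2, T):
    """HT estimator (assumed form): threshold the estimate beta_k - sigma2."""
    return [hard_threshold(b - sigma2, T) for b in beta]


def ml_estimate(beta, sigma2, S):
    """ML estimator for the SDCM, given the energies beta_k(y) > 0."""
    score = [b / sigma2 - math.log(b / sigma2) - 1.0 for b in beta]
    # L1: the S indices with the largest score
    L1 = set(sorted(range(len(beta)), key=lambda k: score[k], reverse=True)[:S])
    # L2: indices whose energy is at least the noise variance
    L2 = {k for k, b in enumerate(beta) if b >= sigma2}
    keep = L1 & L2
    return [beta[k] - sigma2 if k in keep else 0.0 for k in range(len(beta))]


# Hypothetical energies beta_k(y) for N = 3.
beta = [3.0, 0.5, 2.0]
sigma2 = 1.0
x_ht = ht_estimate(beta, sigma2, T=1.5)
x_ml_1 = ml_estimate(beta, sigma2, S=1)
x_ml_2 = ml_estimate(beta, sigma2, S=2)
x_ml_weak = ml_estimate([0.2, 0.5], sigma2, S=1)
```

The last call shows why the intersection with $\mathcal{L}_2$ matters: when every energy is below the noise variance, the ML estimate is the all-zero vector even though $\mathcal{L}_1$ is nonempty.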

For a numerical evaluation, we considered the SDCM with $N = 5$, $S = 1$, $\sigma^2 = 1$, and $\mathbf{C}_k = \mathbf{e}_k \mathbf{e}_k^T$. We generated parameter vectors $\mathbf{x}_0$ with $j_0 = 1$ and different $\xi_0$. In Fig. 1, we show the variance at $\mathbf{x}_0$, $v(\hat{\mathbf{x}}(\cdot), \mathbf{x}_0) = \sum_{k=1}^{N} v(\hat{x}_k(\cdot), \mathbf{x}_0)$ (computed by means of numerical integration), for the HT estimator using various choices of $T$ and for the ML estimator. The variance is plotted versus $\mathrm{SNR} = \xi(\mathbf{x}_0)/\sigma^2 = \xi_0/\sigma^2$. Along with each variance curve, we display a corresponding lower bound that was calculated by evaluating (21) for each $k$, using for $\gamma_k(\mathbf{x})$ the mean function of the respective estimator (HT or ML), and summing over all $k$. (The mean functions of the HT and ML estimators were computed by means of numerical integration.) In evaluating (21), we used partial derivatives of order at most 1 in (17); specifically, we chose $L = 2$, $\mathcal{K} = \{k\}$, $\mathbf{p}_1 = \mathbf{0}$, and $\mathbf{p}_2 = \mathbf{e}_k$. In Fig. 1, all variances and bounds are normalized by $2(\xi_0 + \sigma^2)^2$, which is the variance of the oracle estimator knowing $j_0$.

It can be seen from Fig. 1 that in the high-SNR regime, for both estimators, the gap between the variance and the corresponding lower bound is quite small. This indicates that the performance of both estimators is nearly optimal. However, in the low-SNR regime, the variances of the estimators tend to be significantly higher than the bounds. This means that there may be estimators with the same mean function (and hence the same bias) as the HT or ML estimator but a lower variance. However, the actual existence of such estimators is not shown by our analysis.

[Figure 1: plot of normalized variance versus SNR [dB]. Curves: $v(\hat{\mathbf{x}}_{\mathrm{ML}}(\cdot); \mathbf{x}_0)$ and $v(\hat{\mathbf{x}}_{\mathrm{HT}}(\cdot); \mathbf{x}_0)$ for $T = 3$, $T = 6$, and $T = 9$, each shown together with the corresponding lower bound.]

Fig. 1. Normalized variance of the HT and ML estimators and corresponding lower bounds versus $\mathrm{SNR} = \xi(\mathbf{x}_0)/\sigma^2$, for the SDCM with $N = 5$, $S = 1$, $\sigma^2 = 1$, and $\mathbf{C}_k = \mathbf{e}_k \mathbf{e}_k^T$.

6. CONCLUSION

We considered estimation of (a function of) a sparse vector $\mathbf{x}$ that determines the covariance matrix of a Gaussian random vector via a parametric covariance model. Using RKHS theory, we derived lower bounds on the estimator variance for a prescribed bias and mean function. For the important special case of unbiased estimators of $\mathbf{x}$, we found that the transition of our bounds from low to high SNR is significantly slower than that of analogous bounds for the sparse linear model [6]. This suggests that the prior information of sparsity is not as helpful as for the sparse linear model. Numerical results showed that for low SNR, the variance of two standard estimators (hard-thresholding estimator and maximum likelihood estimator) is significantly higher than our bounds. Hence, there might exist estimators that have the same bias and mean function as these standard estimators but a smaller variance.